A New Generalized Error Path Algorithm for Model Selection
Authors
Abstract
Model selection with cross validation (CV) is very popular in machine learning. However, CV combined with grid search or other common search strategies is not guaranteed to find the model with the minimum CV error, which is often the ultimate goal of model selection. Recently, various solution path algorithms have been proposed for several important learning algorithms, including support vector classification and Lasso. However, they still do not guarantee finding the model with the minimum CV error. In this paper, we first show that the solution paths produced by various algorithms are piecewise linear. We then prove that, based on the solution path, a large class of error (or loss) functions are piecewise constant, linear, or quadratic with respect to the regularization parameter. Finally, we propose a new generalized error path algorithm (GEP) and prove that it finds the model with the minimum CV error over the entire range of the regularization parameter. Experimental results on a variety of datasets not only confirm our theoretical findings but also show that the best model found by GEP has better generalization error on the test data than the models found by grid search, manual search, and random search.
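The central observation above, that a piecewise-linear solution path makes the validation error piecewise quadratic in the regularization parameter, can be illustrated for squared error. The following is a minimal sketch, not the GEP algorithm itself: it assumes a single validation fold, squared loss, and a path already given as breakpoints `lambdas` with coefficient vectors `betas` (hypothetical inputs, for example produced by a Lasso path solver). On each linear segment the error is an explicit quadratic, so its minimum can be computed exactly instead of being approximated on a grid.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _residual(X: Matrix, y: Sequence[float], beta: Sequence[float]) -> Vector:
    """Validation residuals r = y - X beta."""
    return [yi - _dot(row, beta) for row, yi in zip(X, y)]


def min_squared_error_on_path(
    lambdas: Sequence[float],          # hypothetical: path breakpoints
    betas: Sequence[Sequence[float]],  # hypothetical: coefficients at each breakpoint
    X_val: Matrix,
    y_val: Sequence[float],
) -> Tuple[float, float]:
    """Exactly minimize validation squared error over a piecewise-linear path.

    Between breakpoints, beta(lambda) is linear, so the residual is linear in
    t in [0, 1] and the error E(t) = a t^2 + b t + c is quadratic.
    """
    residuals = [_residual(X_val, y_val, b) for b in betas]
    best_lam, best_err = lambdas[0], _dot(residuals[0], residuals[0])
    for k in range(len(lambdas) - 1):
        r0, r1 = residuals[k], residuals[k + 1]
        d = [b - a for a, b in zip(r0, r1)]
        a = _dot(d, d)          # quadratic coefficient
        b = 2.0 * _dot(r0, d)   # linear coefficient
        c = _dot(r0, r0)        # error at the segment start
        # Segment end (t = 1) plus any interior stationary point.
        candidates = [1.0]
        if a > 0.0:
            t_star = -b / (2.0 * a)
            if 0.0 < t_star < 1.0:
                candidates.append(t_star)
        for t in candidates:
            err = a * t * t + b * t + c
            if err < best_err:
                best_err = err
                best_lam = lambdas[k] + t * (lambdas[k + 1] - lambdas[k])
    return best_lam, best_err


if __name__ == "__main__":
    # Toy path: both breakpoints give error 2.0; the exact minimum is interior.
    print(min_squared_error_on_path([2.0, 0.0], [[0.0], [2.0]], [[1.0], [1.0]], [1.0, 1.0]))
    # prints (1.0, 0.0)
```

In this toy case, evaluating only the two breakpoints (as a coarse grid would) gives an error of 2.0, while the exact segment minimum at λ = 1.0 gives 0.0. Extending this to K-fold CV would require merging the breakpoints of all folds' paths and summing the per-fold quadratics on each merged segment; that step is not shown here.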
Similar resources
Row/Column-First: A Path-based Multicast Algorithm for 2D Mesh-based Network on Chips
In this paper, we propose a new path-based multicast algorithm called the Row/Column-First algorithm. The proposed algorithm constructs a set of multicast paths to deliver a multicast message to all multicast destination nodes. The multicast paths all belong to either the row-first or the column-first subcategory to maximize multicast performance. The selection of row-first or column-first appro...
An Improved Hybrid Model with Automated Lag Selection to Forecast Stock Market
Objective: In general, financial time series such as stock indexes have nonlinear, mutable and noisy behavior. Structural and statistical models and machine learning-based models are often unable to accurately predict series with such a behavior. Accordingly, the aim of the present study is to present a new hybrid model using the advantages of the GMDH method and Non-dominated Sorting Genetic A...
The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models
Introduction: Selecting the appropriate statistical model for the response variable is one of the most important problems in finite mixtures of generalized linear models. One distribution that is problematic in finite mixtures of semi-parametric generalized statistical models is the Poisson distribution. In this paper, to overcome overdispersion and computational burden, finite ...
Pixel selection by successive projections algorithm method in multivariate image analysis for a QSAR study of antimicrobial activity for cephalosporins and design new cephalosporins
Thirty-one Cephalosporin compounds were modeled using the multivariate image analysis and applied to the quantitative structure activity relationship (MIA-QSAR) approach. The acid dissociation constants (pKa) of cephalosporins play a fundamental role in the mechanism of activity of cephalosporins. The antimicrobial activity of cephalosporins was related to their first pKa by different models. B...